Natural Language Processing and Random Forest for Mental Health Symptom Identification Using Social Media Data


Sigit Sugara
Popon Dauni
Novianti Indah Putri
Yogi Saputra
Nana Suryana

Abstract

This study explores the use of Natural Language Processing (NLP) techniques and a Random Forest classifier to detect mental health symptoms from text collected from the web. It addresses the difficulty of analyzing highly subjective and dynamic social media content to identify patterns associated with anxiety, depression, and stress. The methodology applies a sequence of preprocessing steps (case folding, cleansing, language normalization, negation conversion, stopword removal, and tokenization), followed by TF-IDF weighting and Random Forest classification. In evaluation, the model reached an accuracy of approximately 80%, although achieving a confidence level of 75% proved challenging. The results indicate that, despite the inherent difficulty of predicting subjectively variable text, the approach performs satisfactorily in identifying mental health symptoms and offers potential for early detection and intervention systems.
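The preprocessing and weighting stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists (`SLANG`, `NEGATIONS`, `STOPWORDS`), the function names, and the English sample post are hypothetical placeholders. The TF-IDF variant shown (relative term frequency times log(N/df)) is the textbook form, which may differ from the one used in the study. Tokenization happens early here for convenience, whereas the abstract lists it as the last step.

```python
import math
import re
from collections import Counter

# Hypothetical resources; the study's actual dictionaries are not given.
SLANG = {"u": "you", "im": "i am", "cant": "can not"}
NEGATIONS = {"not", "no", "never"}
STOPWORDS = {"i", "am", "the", "a", "and", "to", "you", "is", "so", "it", "can"}


def preprocess(text):
    """Return a list of cleaned tokens for one post."""
    text = text.lower()                                  # case folding
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)    # cleansing: URLs, mentions, hashtags
    text = text.replace("'", "")                         # cleansing: join contractions
    text = re.sub(r"[^a-z\s]", " ", text)                # cleansing: digits, punctuation
    words = []
    for w in text.split():                               # tokenization on whitespace
        words.extend(SLANG.get(w, w).split())            # language normalization
    tokens = []
    negate = False
    for w in words:
        if w in NEGATIONS:                               # negation conversion:
            negate = True                                # attach "not_" to the next word
            continue
        if negate:
            tokens.append("not_" + w)
            negate = False
        elif w not in STOPWORDS:                         # stopword removal
            tokens.append(w)
    return tokens


def tfidf(docs):
    """Compute one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


sample = preprocess("I can't sleep, so anxious!!! https://t.co/x")
# sample == ["not_sleep", "anxious"]
```

The resulting weight vectors would then be passed to a Random Forest classifier, for example `RandomForestClassifier` from scikit-learn, which is one possible choice and not necessarily the library used in the study.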


How to Cite
[1] S. Sugara, P. Dauni, N. I. Putri, Y. Saputra, and N. Suryana, “Natural Language Processing and Random Forest for Mental Health Symptom Identification Using Social Media Data,” coreid, vol. 3, no. 3, pp. 99–106, Nov. 2025.


Section
Articles
